Partition into heapable sequences, heap
tableaux and a multiset extension of
Hammersley's process


Gabriel Istrate, Cosmin Bonchiş


Abstract 

We investigate partitioning of integer sequences into heapable
subsequences (previously introduced by Mitzenmacher et al. [BHMZ11]).
We show that an extension of patience sorting computes
the decomposition into a minimal number of heapable subsequences
(MHS). We connect this parameter to an interactive particle system,
a multiset extension of Hammersley's process, and investigate its
expected value on a random permutation. In contrast with the (well
studied) case of the longest increasing subsequence, we bring
experimental evidence that the correct asymptotic scaling is
$\frac{1+\sqrt{5}}{2} \cdot \ln(n)$.
Finally we give a heap-based extension of Young tableaux, prove a
hook inequality and an extension of the Robinson-Schensted
correspondence.


1 Introduction 


Patience sorting [Mal63] and the longest increasing subsequence (LIS) are
well-studied topics in combinatorics. The analysis of the expected length of
the LIS of a random permutation is a classical problem displaying
interesting connections with the theory of interacting particle systems [AD99]
and that of combinatorial Hopf algebras [Hiv07]. Recursive versions of
patience sorting are involved (under the name of Schensted procedure [Sch61])
in the theory of Young tableaux. A wonderful recent reference for the
rich theory of the longest increasing sequences (and substantially more)
is [Rom14].

Recently Mitzenmacher et al. [BHMZ11] introduced, under the name
of heapable sequence, an interesting variation on the concept of
increasing sequences. Informally, a sequence of integers is heapable if it can be
successively inserted into a (not necessarily complete) binary tree
satisfying the heap property without having to resort to node rearrangements.
Mitzenmacher et al. showed that the longest heapable subsequence in a
random permutation grows linearly (rather than asymptotically equal to
$2\sqrt{n}$, as does LIS) and raised as an open question the issue of extending
the rich theory of LIS to the case of heapable sequences.

In this paper we partly answer this open question: we define a family
$MHS_k(X)$ of measures (based on decomposing the sequence into
subsequences heapable into a min-heap of arity at most $k$) and show that
a variant of patience sorting correctly computes the values of these
parameters. We show that this family of measures forms an infinite
hierarchy, and investigate the expected value of parameter $MHS_2[\pi]$, where
$\pi$ is a random permutation of order $n$. Unlike the case $k = 1$, where
$E[MHS_1[\pi]] = E[LDS[\pi]] \sim 2\sqrt{n}$, we argue that in the case $k \ge 2$ the
correct scaling is logarithmic, bringing experimental evidence that the
precise scaling is $E[MHS_2[\pi]] \sim \phi \ln n$, where $\phi = \frac{1+\sqrt{5}}{2}$ is the golden ratio.
The analysis exploits the connection with a new, multiset extension of the
Hammersley-Aldous-Diaconis process [AD95], an extension that may be
of independent interest. Finally, we introduce a heap-based
generalization of Young tableaux. We prove (Theorem 6 below) a hook inequality
related to the hook formula for Young tableaux [FRT54] and Knuth's hook
formula for heap-ordered trees [Knu98], and (Theorem 8) an extension of
the Robinson-Schensted (R-S) correspondence.


2 Preliminaries 

For $k \ge 1$ define alphabet $\Sigma_k = \{1, 2, \ldots, k\}$. Define as well
$\Sigma_\infty = \cup_{k \ge 1} \Sigma_k$.
Given words $x, y$ over $\Sigma_\infty$ we will denote by $x \sqsubseteq y$ the fact that $x$ is a prefix
of $y$. The set of (non-strict) prefixes of $x$ will be denoted by $Pref(x)$. Given
words $x, y \in \Sigma_\infty^*$ define the prefix partial order $x \preceq_{ppo} y$ as follows: If $x \sqsubseteq y$
then $x \preceq_{ppo} y$. If $x = za$, $y = zb$, $a, b \in \Sigma_\infty$ and $a < b$ then $x \preceq_{ppo} y$. $\preceq_{ppo}$ is
the transitive closure of these two constraints. Similarly, the lexicographic
partial order $\preceq_{lex}$ is defined as follows: If $x \sqsubseteq y$ then $x \preceq_{lex} y$. If $x = za$,
$y = zb$, $a, b \in \Sigma_\infty$ and $a < b$ then $x \preceq_{lex} y$. $\preceq_{lex}$ is the transitive closure of
these two constraints.

A k-ary tree is a finite, $\preceq_{ppo}$-closed set $T$ of words over alphabet
$\Sigma_k = \{1, 2, \ldots, k\}$. That is, we impose the condition that positions on the same
level in a tree are filled preferentially from left to right. The position $pos(x)$
of node $x$ in a k-ary tree is the string over alphabet $\{1, 2, \ldots, k\}$ encoding the
path from the root to the node (e.g. the root has position $\lambda$, its children
have positions $1, 2, \ldots, k$, and so on). A k-ary (min)-heap is a function
$f : T \to \mathbb{N}$ monotone with respect to pos, i.e.
$(\forall x, y \in T)$, $[pos(x) \sqsubseteq pos(y)] \Rightarrow [f(x) \le f(y)]$.

A (binary min-)heap is a binary tree, not necessarily complete, such that
$A[parent[x]] \le A[x]$ for every non-root node $x$. If instead of binary we
require the tree to be k-ary we get the concept of k-ary min-heap.

A sequence $X = X_0, \ldots, X_{n-1}$ is k-heapable if there exists some k-ary
tree $T$ whose nodes are labeled with (exactly one of) the elements of $X$,
such that for every non-root node $X_i$ and parent $X_j$, $X_j \le X_i$ and $j < i$. In
particular a 2-heapable sequence will simply be called heapable [BHMZ11].
Given a sequence of integer numbers $X$, denote by $MHS_k(X)$ the smallest
number of k-heapable (not necessarily contiguous) subsequences one can
decompose $X$ into. $MHS_1(X)$ is equal [LP94] to the shuffled up-sequences (SUS)
measure in the theory of presortedness.

Example 1. Let $X = [2, 4, 3, 1]$. Via patience sorting $MHS_1(X) = SUS(X) = 3$.
$MHS_2(X) = 2$, since subsequences $[2, 4, 3]$ and $[1]$ are 2-heapable. On the
other hand, for every $k \ge 1$, $MHS_k([k, k-1, \ldots, 1]) = k$.

Analyzing the behavior of LIS relies on the correspondence between
longest increasing sequences and an interactive particle system [AD95]
called the Hammersley-Aldous-Diaconis (shortly, Hammersley or HAD)
process. We give it the multiset generalization displayed in Figure 1.
Technically, to recover the usual definition of Hammersley's process one should
take $X_a > X_{t+1}$ (rather than $X_a < X_{t+1}$). This small difference arises
since we want to capture $MHS_k(\pi)$, which generalizes $LDS(\pi)$, rather
than $LIS(\pi)$ (captured by Hammersley's process). This slight difference
is, of course, inconsequential: our definition is simply the "flipped around
the midpoint of segment [0,1]" version of such a generalization, and has
similar behaviour.

• A number of individuals appear (at integer times $i \ge 1$) as
random numbers $X_i$, uniformly distributed in the interval [0,1].

• Each individual is initially endowed with k "lifelines".

• The appearance of a new individual $X_{t+1}$ subtracts a life from the
largest individual $X_a < X_{t+1}$ (if any) still alive at moment $t$.

Figure 1: $HAD_k$, the multiset Hammersley process with k lifelines.
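The rules of Figure 1 can be sketched in a few lines of Python. This is only an illustrative sketch: the names `had_step`, `count_minima` and `simulate` are ours, and we assume that "existing particles" in the definition of a minimum means particles that still have at least one lifeline.

```python
import bisect
import random


def had_step(alive, x, k):
    """Insert particle x into `alive`, a value-sorted list of (value, lives).

    Returns True when no live particle lies below x (x is a new minimum).
    """
    values = [v for v, _ in alive]
    pos = bisect.bisect_left(values, x)
    is_minimum = pos == 0
    if not is_minimum:
        value, lives = alive[pos - 1]      # largest live particle below x
        if lives == 1:
            del alive[pos - 1]             # last lifeline used: particle dies
            pos -= 1
        else:
            alive[pos - 1] = (value, lives - 1)
    alive.insert(pos, (x, k))              # newcomer starts with k lifelines
    return is_minimum


def count_minima(xs, k):
    """Number of steps at which the arriving particle is a new minimum."""
    alive = []
    return sum(had_step(alive, x, k) for x in xs)


def simulate(n, k, seed=0):
    """Run the process on n independent uniform samples from [0, 1]."""
    rng = random.Random(seed)
    return count_minima([rng.random() for _ in range(n)], k)
```

On the sequence of Example 1, `count_minima([2, 4, 3, 1], 1)` and `count_minima([2, 4, 3, 1], 2)` reproduce the values 3 and 2 of $MHS_1$ and $MHS_2$.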


3 A greedy approach to computing $MHS_k$

First we show that one can combine patience sorting and the greedy
approach in [BHMZ11] to obtain an algorithm for computing $MHS_k(X)$. To
do so, we must adapt to our purposes some notation in that paper.

A binary tree with n nodes has n + 1 positions (that will be called slots) 
where one can add a new number. We will identify a slot with the minimal 
value of a number that can be added to that location. For heap-ordered 
trees it is the value of the parent node. Slots easily generalize to forests. 
The number of slots of a forest with d trees and n nodes is n + d.

Given a binary heap forest $T$, the signature of $T$, denoted $sig(T)$, is the
vector of the (values of) free slots in $T$, in sorted (non-decreasing) order.
Given two binary heap forests $T_1, T_2$, $T_1$ dominates $T_2$ if $|sig_{T_1}| \le |sig_{T_2}|$ and
inequality $sig_{T_1}[i] \le sig_{T_2}[i]$ holds for all $1 \le i \le |sig_{T_1}|$.

Theorem 1. For every fixed $k \ge 1$ there is a polynomial time algorithm that,
given sequence $X = (X_0, \ldots, X_{n-1})$ as input, computes $MHS_k(X)$.

Proof. We use the greedy approach of Algorithm 3.1. Proving correctness
of the algorithm employs the following






Algorithm 3.1: GREEDY(X)

INPUT X = (X_0, X_1, ..., X_{n-1}) a list of integers.

Start with empty heap forest T = ∅.
for i in range(n):
    if (there exists a slot where X_i can be inserted):
        insert X_i in the slot with the largest value not exceeding X_i
    else:
        start a new heap consisting of X_i only.


Lemma 1. Let $T_1, T_2$ be two heap forests such that $T_1$ dominates $T_2$. Insert a new
element $x$ in both $T_1$ and $T_2$: greedily in $T_1$ (i.e. at the largest slot with value less
or equal to $x$, or as the root of a new tree, if no such slot exists) and arbitrarily in
$T_2$, obtaining forests $T_1'$, $T_2'$ respectively. Then $T_1'$ dominates $T_2'$.

Proof. First note that, by domination, if no slot of $T_1$ can accommodate $x$
(which, thus, starts a new tree) then a similar property is true in $T_2$ (and
thus $x$ starts a new tree in $T_2$ as well).

Let $sig_{T_1} = (a_1, a_2, \ldots)$ and $sig_{T_2} = (b_1, b_2, \ldots)$ be the two signatures.
The process of inserting $x$ can be described as adding two copies of $x$ to
the signature of $T_1$ ($T_2$) and (perhaps) removing a label $\le x$ from the two
signatures. The removed label is $a_i$, the largest label $\le x$, in the case of
greedy insertion into $T_1$. Let $b_j$ be the largest value (or possibly none) in
$T_2$ less or equal to $x$. Some $b_k$ less or equal to $b_j$ is replaced by two copies
of $x$ in $T_2$. The following are true:

• The length of $sig_{T_1'}$ is at most that of $sig_{T_2'}$.

• The element $b_k$ (if any) deleted by $x$ from $T_2$ satisfies $b_k \le x$. Its index
in $T_2$ is less or equal to $i$.

• The two $x$'s are inserted to the left of the deleted (if any) positions in
both $T_1$ and $T_2$.

Consider some position $l$ in $sig_{T_1'}$. Our goal is to show that $a'_l \le b'_l$.
Several cases are possible:

• $l < k$. Then $a'_l = a_l$ and $b'_l = b_l$.

• $k \le l < j$. Then $a'_l = a_l$ and $b'_l = b_{l+1} \ge a_l$.

• $i \le l \le i + k - 1$. Then $a'_l \le x$ and $b'_l \ge x$.

• $l > i + k - 1$. Then $a'_l = a_{l-k+1}$ and $b'_l = b_{l-k+1}$.

Figure 2: The argument of Lemma 1. Pictured vectors (both initial and
resulting) have equal lengths (which may not always be the case).

□

Let $X$ be a sequence of integers, OPT be an optimal partition of $X$
into k-heapable sequences and $T$ be the solution produced by GREEDY.
Applying Lemma 1 repeatedly we infer that whenever GREEDY adds a
new heap the same thing happens in OPT. Thus the number of heaps
created by GREEDY is optimal, which means that the algorithm computes
$MHS_k(X)$. □
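As an illustration of the greedy rule described in Lemma 1 (insert at the largest slot with value less or equal to the new element, otherwise start a new tree), here is a minimal Python sketch. The function name `greedy_mhs` is hypothetical, and the forest is represented only by its signature, i.e. the sorted multiset of free-slot values, which is all the greedy rule needs.

```python
import bisect


def greedy_mhs(xs, k):
    """Return the number of k-ary heaps started by the greedy insertion rule.

    `slots` holds the signature of the current forest: the values of all
    free slots, kept in non-decreasing order.
    """
    slots = []
    heaps = 0
    for x in xs:
        i = bisect.bisect_right(slots, x)   # slots[:i] are exactly those <= x
        if i == 0:
            heaps += 1                      # no slot can host x: new heap
        else:
            i -= 1
            del slots[i]                    # occupy the largest slot <= x
        slots[i:i] = [x] * k                # x opens k fresh slots of value x
    return heaps
```

For instance, `greedy_mhs([2, 4, 3, 1], 1)` and `greedy_mhs([2, 4, 3, 1], 2)` give 3 and 2, matching Example 1. Each step costs a binary search plus a list update, so the sketch runs in polynomial time, as Theorem 1 requires.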

Trivially $MHS_k(X) \le MHS_{k-1}(X)$. On the other hand

Theorem 2. The following statements (proved in the Appendix) are true for every
$k \ge 2$: (a) there exists a sequence $X$ such that $MHS_k(X) < MHS_{k-1}(X)
< \ldots < MHS_1(X)$; (b) $\sup_X [MHS_{k-1}(X) - MHS_k(X)] = \infty$.







































































4 The connection with the multiset Hammersley 
process 

Denote by $MinHAD_k(n)$ the random variable denoting the number of times
$i$ in the evolution of process $HAD_k$ up to time $n$ when the newly inserted particle
$X_i$ has lower value than all the existing particles at time $i$. The observation
from [Ham72, AD95] generalizes to:

Theorem 3. For every fixed $k, n \ge 1$, $E_{\pi \in S_n}[MHS_k(\pi)] = E[MinHAD_k(n)]$.

Proof Sketch: W.h.p. all $X_i$'s are different. We will thus ignore in the
sequel the opposite alternative. Informally minima correspond to new
heaps and live particles to slots in these heaps (cf. also Lemma 1). □


5 The asymptotic behavior of $E[MHS_2[\pi]]$


The asymptotic behavior of $E[MHS_1[\pi]]$ where $\pi$ is a random
permutation in $S_n$ is a classical problem in probability theory: results in [Ham72],
[LS77], [VK77], [AD95] show that it is asymptotically equal to $2\sqrt{n}$.

A simple lower bound valid for all values of $k \ge 1$ is

Theorem 4. For every fixed $k, n \ge 1$

$$E_{\pi \in S_n}[MHS_k(\pi)] \ge H_n, \text{ the } n\text{th harmonic number.} \quad (1)$$

Proof. For $\pi \in S_n$ the set of its minima is defined as $Min(\pi) = \{j \in [n] :
\pi[j] < \pi[i] \text{ for all } 1 \le i < j\}$ (and similarly for maxima). It is easy to see
that $MHS_k[\pi] \ge |Min[\pi]|$. Indeed, every minimum of $\pi$ must determine
the starting of a new heap, no matter what $k$ is. Now we use the
well-known formula $E_{\pi \in S_n}[|Min[\pi]|] = E_{\pi \in S_n}[|Max[\pi]|] = H_n$ [Knu98]. □
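The identity used at the end of the proof can be checked exactly for small $n$ by enumerating all permutations. The sketch below is only a sanity check with hypothetical helper names; it uses exact rational arithmetic to avoid rounding issues.

```python
from fractions import Fraction
from itertools import permutations


def left_to_right_minima(perm):
    """Count positions j whose value is smaller than every earlier value."""
    count, best = 0, None
    for v in perm:
        if best is None or v < best:
            count += 1
            best = v
    return count


def average_minima(n):
    """Exact average number of minima over all permutations of 1..n."""
    perms = list(permutations(range(1, n + 1)))
    return Fraction(sum(left_to_right_minima(p) for p in perms), len(perms))


def harmonic(n):
    """The n-th harmonic number H_n as an exact fraction."""
    return sum(Fraction(1, i) for i in range(1, n + 1))
```

For $n = 4$ both `average_minima(4)` and `harmonic(4)` evaluate to 25/12.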

To gain insight in the behavior of process $HAD_2$ we note that, rather
than giving the precise values of $X_0, X_1, \ldots, X_t \in [0,1]$, an equivalent
random model inserts $X_t$ uniformly at random in any of the $t + 1$ possible
positions determined by $X_0, X_1, \ldots, X_{t-1}$. This model translates into the
following equivalent combinatorial description of $HAD_k$: a word $w_t$ over
the alphabet $\{-1, 0, 1, 2\}$ describes the state of the process at time $t$. Each
$w_t$ conventionally starts with a $-1$ and continues with a sequence of 0's, 1's
and 2's, informally the "number of lifelines" of particles at time $t$. For
instance $w_0 = -1$, $w_1 = -1\,2$, $w_2$ is either $-1\,2\,2$ or $-1\,1\,2$, depending on
whether $X_1 < X_0$ or $X_0 < X_1$,
and so on. At each time $t$ a random letter of $w_t$ is chosen (corresponding to
a position for $X_t$) and we apply one of the following transformations, the
appropriate one for the chosen position:

• Replacing $-1\,0^r$ by $-1\,0^r\,2$: This is the case when $X_t$ is the smallest
particle still alive, and there are $r \ge 0$ dead particles smaller than it.

• Replacing $1\,0^r$ by $0^{r+1}\,2$: Suppose that $X_a$ is the largest live label less
or equal to $X_t$, that the corresponding particle $X_a$ has one lifeline
at time $t$, and that there are $r$ dead particles between $X_a$ and $X_t$.
Adding $X_t$ (with multiplicity two) decreases the multiplicity of $X_a$ to 0.

• Replacing $2\,0^r$ by $1\,0^r\,2$: Suppose that $X_a$ is the largest label less or equal
to $X_t$, its multiplicity is two, and there are $r \ge 0$ dead particles
between $X_a$ and $X_t$. Adding $X_t$ removes one lifeline from particle $X_a$.
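The three rewriting rules above can be read as a single update on the word. The following Python sketch is one possible reading, under our own assumptions: the word is a list whose first letter is the sentinel $-1$, the new particle is placed immediately after the chosen letter, and the function names are hypothetical.

```python
import random


def word_step(word, pos):
    """Apply one step of the two-lifeline process at index `pos` of `word`.

    Returns the new word and a flag telling whether the newcomer is a new
    minimum (no live particle to its left).
    """
    new = list(word)
    j = pos
    while new[j] == 0:          # skip dead particles; the sentinel stops us
        j -= 1
    is_minimum = new[j] == -1
    if not is_minimum:
        new[j] -= 1             # nearest live particle loses one lifeline
    new.insert(pos + 1, 2)      # the newcomer starts with two lifelines
    return new, is_minimum


def simulate_minima(n, seed=0):
    """Count new minima over n uniformly random insertions."""
    rng = random.Random(seed)
    word, minima = [-1], 0
    for _ in range(n):
        word, is_min = word_step(word, rng.randrange(len(word)))
        minima += is_min
    return minima
```

Starting from `[-1, 2]`, choosing index 0 yields `[-1, 2, 2]` (a new minimum) and choosing index 1 yields `[-1, 1, 2]`, the two possibilities for $w_2$ mentioned in the text.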


Simulating the (combinatorial version of the) Hammersley process with
two lifelines confirms the fact that $E[MHS_2(\pi)]$ grows significantly slower
than $E[MHS_1(\pi)]$: The x-axis in the figure is logarithmic. The scaling is
clearly different, and is consistent (see inset) with logarithmic growth
(displayed as a straight line on a plot with log-scaling on the x-axis).
Experimental results (see the inset/caption of Fig. 3) suggest the following bold


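Conjecture 1. We have $\lim_{n \to \infty} \frac{E[MHS_2[\pi]]}{\ln(n)} = \phi$, with $\phi = \frac{1+\sqrt{5}}{2}$ the golden ratio.

More generally, for an arbitrary $k \ge 2$ the relevant scaling is

$$\lim_{n \to \infty} \frac{E[MHS_k[\pi]]}{\ln(n)} = \frac{1}{\phi_k} \quad (2)$$

where $\phi_k$ is the unique root in $(0,1)$ of equation $X^k + X^{k-1} + \ldots + X = 1$.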

We plan to present the experimental evidence for the truth of
equation (2) and a nonrigorous, "physics-like" justification, together with
further insights on the so-called hydrodynamic behavior [Gro02] of the $HAD_k$
process in subsequent work [IB15]. For now we limit ourselves to showing
that one can (rigorously) perform a first step in the analysis of the $HAD_2$
process: we prove convergence of (some of) its structural characteristics.
This will likely be useful in a full rigorous proof of Conjecture 1.

Denote by $L_t$ the number of digits equal to 1 or 2, and by $C_t$ the number of ones
in $w_t$. Let $l(t) = E[\frac{L_t}{t}]$, $c(t) = E[\frac{C_t}{t}]$. $l(t)$, $c(t)$ always belong to [0,1].











Figure 3: Scaling of expected value of $MHS_k[\pi]$ for $k = 1, 2$. The inset
shows $E[MHS_2[\pi]]$ (red) versus $\phi \cdot \ln(n) + 1$ (blue). The fit is strikingly
accurate.


Theorem 5. There exist constants $l, c \in [0,1]$ such that $l(t) \to l$, $c(t) \to c$.

Proof Sketch: We use a standard tool, subadditivity: if sequence $a_n$
satisfies $a_{m+n} \le a_m + a_n$ for all $m, n \ge 1$ then (by Fekete's Lemma ([Ste97] pp.
3, [Szp01])) $\lim_{n \to \infty} a_n/n$ exists. We show in the Appendix that this is the
case for two independent linear combinations of $l(t)$ and $c(t)$. □

Experimentally (and nonrigorously) $l = \phi - 1 = \frac{\sqrt{5}-1}{2}$ and $c = \ldots$.
"Physics-like" nonrigorous arguments then imply the desired scaling. An
additional ingredient is that digits 0/1/2 are uniformly distributed
(conditional on their density) in a large $w_t$. This is intuitively true since for large
$t$ the behavior of the $HAD_k$ process is described by a compound Poisson
process. We defer more complete explanations to [IB15].


6 Heap tableaux, a hook inequality and a generalization
of the Robinson-Schensted Correspondence

Finally, we present an extension of Young diagrams to heap-based
tableaux. All proofs are given in the Appendix. A (k-)heap tableau $T$ is a
k-ary min-heap of integer vectors, so that for every $r \in \Sigma_k^*$ the vector
$V_r$ at address $r$ is nondecreasing. We formally represent the tableau as
a function $T : \Sigma_k^* \times \mathbb{N} \to \mathbb{N} \cup \{\perp\}$ such that (a) $T$ has finite support:
the set $dom(T) = \{(r, a) : T(r, a) \ne \perp\}$ of nonempty positions is finite;
(b) $T$ is $\sqsubseteq$-nondecreasing: if $T(r, a) \ne \perp$ and $q \sqsubseteq r$ then $T(q, a) \ne \perp$ and
$T(q, a) \le T(r, a)$. In other words, $T(\cdot, a)$ is a min-heap; (c) $T$ is columnwise
increasing: if $T(r, a) \ne \perp$ and $b < a$ then $T(r, b) \ne \perp$ and $T(r, b) < T(r, a)$.
That is, each column $V_r$ is increasing. The shape of $T$ is the heap $S(T)$ where
node with address $r$ holds value $|V_r|$.

A tableau is standard if (e) for all $1 \le i \le n = |dom(T)|$, $|T^{-1}(i)| = 1$
and (f) if $x \preceq_{lex} y$ and $T(y, 1) \ne \perp$ then $\perp \ne T(x, 1) < T(y, 1)$. I.e., labels in
the first heap $H_1$ are increasing from left to right and top to bottom.

Example 2. A heap tableau $T_1$ with 9 elements is presented in Fig. 4(a) and as
a Young-like diagram in Fig. 4(b). Note that: (i) Columns correspond to rows
of $T_1$. (ii) Their labels are in $\Sigma_k^*$ rather than $\mathbb{N}$. (iii) Cells may contain $\perp$. (iv)
Rows need not be increasing, only min-heap ordered.


One important drawback of our notion of heap tableaux above is that
they do not reflect the evolution of the process $HAD_k$ the way ordinary
Young tableaux do (on their first line) for process $HAD_1$ via the
Schensted procedure [Sch61]: A generalization with this feature would seem
to require that each cell contains not an integer but a multiset of integers.
Obtaining such a notion of tableau is part of ongoing research.

However, we can motivate our definition of heap tableau by the first
application below, a hook inequality for such tableaux. To explain it, note
that heap tableaux generalize both heap-ordered trees and Young tableaux.
In both cases there exist hook formulas that count the number of ways to
fill in a structure with n cells by numbers from 1 to n: [FRT54] for Young
tableaux and [Knu98] (Sec. 5.1.4, Ex. 20) for heap-ordered trees. It is
natural to wonder whether there exists a hook formula for heap tableaux that
provides a common generalization of both these results.

Theorem 6 gives a partial answer: not a formula but a lower bound. To
state it, given $(\alpha, i) \in dom(T)$, define the hook length $H_{\alpha,i}$ to be the cardinal
of set $\{(\beta, j) \in dom(T) : [(j = i) \wedge (\alpha \sqsubseteq \beta)] \vee [(j \ge i) \wedge (\alpha = \beta)]\}$. For
example, Fig. 4(c) displays the hook lengths of cells in $T_1$.
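To make the definition concrete, the following Python sketch computes all hook lengths of a small tableau. The representation is our own assumption, not the paper's: a tableau is a dictionary mapping an address (a tuple of child numbers, `()` for the root) to the vector stored there, and vector indices are 1-based.

```python
def hook_lengths(tableau):
    """Map each cell (address, index) of `tableau` to its hook length."""
    hooks = {}
    for addr, vec in tableau.items():
        for i in range(1, len(vec) + 1):
            # cells (beta, i) whose address beta has addr as a prefix
            heap_part = sum(
                1 for beta, w in tableau.items()
                if beta[:len(addr)] == addr and len(w) >= i
            )
            # cells (addr, j) with j >= i
            vector_part = len(vec) - i + 1
            # the cell (addr, i) itself was counted in both parts
            hooks[(addr, i)] = heap_part + vector_part - 1
    return hooks
```

For the (hypothetical) tableau `{(): [1, 2], (1,): [3, 5], (2,): [4]}` the root cell `((), 1)` has hook length 4, the cells `((), 2)` and `((1,), 1)` have hook length 2, and the two remaining cells are corners with hook length 1.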


Theorem 6. Given $k \ge 2$ and a k-shape $S$ with $n$ free cells, the number of ways to
create a heap tableau $T$ with shape $S$ by filling its cells with numbers $\{1, 2, \ldots, n\}$
is at least $\frac{n!}{\prod_{(\alpha,i) \in dom(T)} H_{\alpha,i}}$. The bound is tight for Young tableaux [FRT54],
heap-ordered trees [Knu98], and infinitely many other examples, but is also not tight
for infinitely many (counter)examples.

We leave open the issue whether one can tighten up the lower bound
above to a formula by modifying the definition of the hook length $H_{\alpha,i}$.
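For very small shapes the bound of Theorem 6 can be compared with an exhaustive count. The sketch below is only illustrative and rests on our own conventions: a shape is a dictionary from addresses (tuples, `()` for the root) to vector lengths, it is assumed to be a valid shape (every non-root address has its parent present with a vector at least as long), and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def count_fillings(shape):
    """Count fillings with 1..n that give a heap tableau of this shape."""
    cells = [(a, i) for a, m in shape.items() for i in range(1, m + 1)]
    total = 0
    for perm in permutations(range(1, len(cells) + 1)):
        t = dict(zip(cells, perm))
        total += all(
            (i == 1 or t[(a, i - 1)] < v)          # vector is increasing
            and (a == () or t[(a[:-1], i)] < v)    # parent entry is smaller
            for (a, i), v in t.items()
        )
    return total


def hook_bound(shape):
    """The quantity n! divided by the product of all hook lengths."""
    prod = 1
    for a, m in shape.items():
        for i in range(1, m + 1):
            heap_part = sum(1 for b, mb in shape.items()
                            if b[:len(a)] == a and mb >= i)
            prod *= heap_part + (m - i + 1) - 1
    return Fraction(factorial(sum(shape.values())), prod)
```

With these conventions the shape `{(): 2, (1,): 1, (2,): 1}` has 6 fillings and bound 6 (tight), while `{(): 2, (1,): 2, (2,): 1}` has 8 fillings against a bound of 15/2 (not tight).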
Figure 4: (a) Heap tableau $T_1$ and its shape $S(T_1)$ (in brackets). (b) The
equivalent Young tableau-like representation of $T_1$. (c) The hook
lengths.


We can create k-heap tableaux from integer sequences by a version of
the Schensted procedure [Sch61]. Algorithm Schensted-HEAP_k below
performs column insertions and gives to any bumped element $k$ choices for
insertion/bumping, the children of vector $V_r$, with addresses $r \cdot \Sigma_k$.

Theorem 7. The result of applying the Schensted-HEAP_k procedure to an
arbitrary permutation $X$ is indeed a k-ary heap tableau.

Example 3. Suppose we start with $T_1$ from Fig. 4(a). Then (Fig. 5) 9 is appended
to vector $V_\lambda$. 7 arrives, bumping 8, which in turn bumps 11. Finally 11 starts a
new vector at position 00. Modified cells are grayed.

Procedure Schensted-HEAP_k does not help in computing the longest
heapable subsequence: The complexity of computing this parameter is
open [BHMZ11], and we make no progress on this issue. On the other
hand, we can give a $k \ge 2$ version of the R-S correspondence:

Theorem 8. For every $k \ge 2$ there exists a bijection between permutations
$\pi \in S_n$ and pairs $(P, Q)$ of k-heap tableaux with $n$ elements and identical shape, where
$Q$ is a standard tableau.

Condition "Q is standard" is specific to case $k \ge 2$: heaps simply
have "too many degrees of freedom" between siblings. Schensted-HEAP_k
solves this problem by starting new vectors from left to right and top to
bottom.













































Figure 5: Inserting 9 and 7 into $T_1$.


Algorithm 6.1: SCHENSTED-HEAP_k(X = x_0, ..., x_{n-1})
for i in range(n): BUMP(x_i, λ)

PROCEDURE BUMP(x, S):  # S is a set of addresses.
    - Attempt to append x to some V_r, r ∈ S (perhaps creating it)
      (choose the first r where appending x keeps V_r increasing).
    if (this is not possible for any vector V_r, r ∈ S):
        - Let B_x be the set of elements of value > x,
          in all vectors V_r, r ∈ S (clearly B_x ≠ ∅)
        - Let y = min{B_x} and r the address of its vector.
        - Replace y by x into V_r
        - BUMP(y, r · Σ_k)  # bump y into some child of r

7 Conclusion and Acknowledgments 


Our paper raises a large number of open issues. We briefly list a few:
Rigorously justify Conjecture 1. Study process $HAD_k$ and its variants
[Mon97, CG05]. Reconnect the theory to the analysis of secretary problems
[AM09, BKK+09]. Find the distribution of $MHS_k[\pi]$. Obtain a hook
formula. Define a version of Young tableaux related to process $HAD_k$.

We plan to address some of these in subsequent work. The most
important open problem, however, is the complexity of computing LHS.

This research has been supported by CNCS IDEI Grant PN-II-ID-PCE-2011-3-0981
"Structure and computational difficulty in combinatorial
optimization: an interdisciplinary approach".
Appendix

7.1 Proof of Theorem 2

1. For $k \ge 2$, consider the sequence $X = [1, k+1, k, k-1, \ldots, 2]$.

Lemma 2. We have

$MHS_1(X) = k$, $MHS_2(X) = k-1$, ..., $MHS_k(X) = 1$.

Proof. Applying the Greedy algorithm we obtain the following heap
decompositions:

- $MHS_1(X) = k$: $H_1 = [1, k+1]$, $H_2 = [k]$, $H_3 = [k-1]$, ..., $H_k = [2]$.

- $MHS_2(X) = k-1$: $H_1 = [1, k+1, k]$, $H_2 = [k-1]$, $H_3 = [k-2]$, ..., $H_{k-1} = [2]$.

- $MHS_i(X) = k-i+1$: $H_1 = [1, k+1, k, \ldots, k-i+2]$, $H_2 = [k-i+1]$,
  $H_3 = [k-i]$, ..., $H_{k-i+1} = [2]$.

- $MHS_k(X) = 1$: $H_1 = [1, k+1, k, \ldots, 2]$.

□
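The values in Lemma 2 are easy to confirm numerically. The sketch below is self-contained and purely illustrative: `greedy_heaps` re-implements the greedy rule on signatures, and the helper names are hypothetical.

```python
import bisect


def greedy_heaps(xs, arity):
    """Number of heaps started by greedy insertion into `arity`-ary heaps."""
    slots, heaps = [], 0
    for x in xs:
        i = bisect.bisect_right(slots, x)
        if i == 0:
            heaps += 1                 # no slot with value <= x
        else:
            i -= 1
            del slots[i]               # use the largest slot <= x
        slots[i:i] = [x] * arity
    return heaps


def lemma2_sequence(k):
    """The sequence [1, k+1, k, k-1, ..., 2] from the proof."""
    return [1, k + 1] + list(range(k, 1, -1))


def lemma2_profile(k):
    """Greedy heap counts for arities 1, 2, ..., k on that sequence."""
    xs = lemma2_sequence(k)
    return [greedy_heaps(xs, i) for i in range(1, k + 1)]
```

For $k = 5$ the profile is `[5, 4, 3, 2, 1]`, a strictly decreasing chain as claimed in part (a) of Theorem 2.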


2. Let $k, n \ge 2$. Define sequence

$$X^{(n)} = [1,\ (2 + (k-1)), k, \ldots, 2,\ (3 + 2(k-1) + (k-1)^2), \ldots, (2 + k),\ \ldots,\ \sum_{i=0}^{n} (n+1-i)(k-1)^i, \ldots, 1 + \sum_{i=0}^{n-1} (n-i)(k-1)^i];$$

in other words $X^{(n)} = [1, X_1, X_2, \ldots, X_n]$, where for each $1 \le t \le n$
the subsequence $X_t$ is $X_t = [\sum_{i=0}^{t} (t+1-i)(k-1)^i, \ldots, 1 + \sum_{i=0}^{t-1} (t-i)(k-1)^i]$.
$X_t$ has $(k-1)^t + (k-1)^{t-1} + \ldots + 1$ many elements.

We can see that this sequence is k-heapable, thus $MHS_k(X^{(n)}) = 1$:
$|X_t| = (k-1)|X_{t-1}| + 1 \le k|X_{t-1}|$, and every number in $X_t$ is larger than
every number in $X_{t-1}$. Thus we can arrange the $X_t$'s on (incomplete)
heap levels, with every node in $X_t$ a child of some node in $X_{t-1}$.

Theorem 9. We have

$MHS_{k-1}(X^{(n)}) = n + 1$.

Proof. We apply the GREEDY algorithm. After sequence $X_1$ two
$(k-1)$-heaps are created. $H_1$ has two full levels, $H_2$ contains only the root
2. Sequence $X_2$ has length $(k-1)^2 + (k-1) + 1$. $(k-1)^2$ elements go on the third level
of $H_1$. $k - 1$ elements go on the second level of $H_2$. The remaining
$|X_2| - (k-1)^2 - (k-1) = 1$ element starts a new heap $H_3$.

By induction we easily prove the following

Lemma 3. For every $t \ge 1$, the elements of $X_t$ go via GREEDY
as follows:

- $(k-1)^t$ of them go on level $t$ of $H_1$,

- $(k-1)^{t-1}$ of them go on level $t - 1$ of $H_2$,

- ...

- $k - 1$ of them go on the first level of $H_t$.

The remaining $|X_t| - \sum_{i=1}^{t} (k-1)^i = 1$ element starts a new heap
$H_{t+1}$.

□


7.2 Proof of Theorem 5

First sequence: Define $a_n$ to be the expected cardinality of the multiset of
slots (particle lifelines in process $HAD_2$) at moment $n$. Clearly $a_n/n =
2l(n) - c(n)$. Also, given $Z = (Z_0, Z_1, \ldots, Z_{n-1})$ a finite trajectory in [0,1]
and an initial set of slots $T$, denote by $s(Z;T)$ the multiset of particles
(slots) added during $Z$ that are still alive at the end of the trajectory $Z$, if
at time $t = 0$ the process started with the slots in $T$ (omitting the second
argument if $T = \emptyset$), and $a(Z;T) = |s(Z;T)|$. Finally denote by $v(Z;T)$ the
submultiset of $s(Z;T)$ consisting of elements with multiplicity two, and
by $l(Z;T) = |v(Z;T)|$.

Subadditivity of $a_n$ will follow from the fact that the property holds on
each trajectory: If $X = (X_0, \ldots, X_{n-1})$ and $Y_m = (X_n, \ldots, X_{n+m-1})$ then in fact
we can show that

$$a(XY_m) \le a(X) + a(Y_m). \quad (3)$$

Clearly $a_n = E_{|X|=n}[a(X)]$ so (3) implies that $a_n$ is subadditive. It turns out
that, together with (3), we will need to simultaneously prove that

$$s(Y_m) \subseteq s(Y_m; s(X)) \text{ (as multisets)} \quad (4)$$

We prove (3) and (4) by induction on $m = |Y_m|$. Clearly the inclusion
is true if $m = 0$. Let $Y_m = Y_{m-1}X_{n+m-1}$ and $s(XY_m) = W_m \cup Z_m$ with
$W_m = s(X) \cap s(XY_m)$, $Z_m = Y_m \cap s(XY_m)$.

$s(XY_m)$ modifies $s(XY_{m-1})$ by adding two copies of $X_{n+m-1}$ to $Z_{m-1}$ and,
perhaps, erasing some $p_m$, the largest element (if any) in $s(XY_{m-1})$ smaller
or equal to $X_{n+m-1}$. Thus $a(XY_m) - a(XY_{m-1}) \in \{1, 2\}$.

Similarly, $s(Y_m)$ modifies $s(Y_{m-1})$ by adding two copies of $X_{n+m-1}$ and,
perhaps, erasing some $r_m$, the largest element (if any) in $s(Y_{m-1})$ smaller
or equal to $X_{n+m-1}$. Thus $a(Y_m) - a(Y_{m-1}) \in \{1, 2\}$.

All that remains in order to prove that $a(XY_m) - a(Y_m) \le a(XY_{m-1}) -
a(Y_{m-1})$ (and thus establish inequality (3) inductively for $m$ as well) is that
$(a(Y_m) - a(Y_{m-1}) = 1) \Rightarrow (a(XY_m) - a(XY_{m-1}) = 1)$. This follows easily
from inductive hypothesis (4) for $m - 1$: if $a(Y_m) - a(Y_{m-1}) = 1$ then some
element in $s(Y_{m-1})$ is less or equal to $X_{n+m-1}$. The same must be true for
$s(Y_{m-1}; s(X))$ and hence for $s(XY_{m-1})$ as well (noting, though, that $p_m$ may
well be an element of $X$). Now we have to show that (4) also remains true:
clearly the newly added element, $X_{n+m-1}$, has multiplicity two in both
$s(Y_m)$ and $s(Y_m; s(X))$. Suppose we erase some element $r_m$ from $s(Y_{m-1})$.
Then $r_m$ belongs to $s(Y_{m-1}; s(X))$, has multiplicity at least one there, and
is the largest element smaller or equal to $X_{n+m-1}$ in $s(Y_{m-1}; s(X)) \cap s(Y_{m-1})$.
Thus, when going from $s(Y_{m-1}; s(X))$ to $s(Y_m; s(X))$ we either erase one
copy of $p_m$ or do not erase anything (perhaps we erased some element in
$s(X)$, which is not, however, in $s(Y_{m-1}; s(X))$). Suppose, on the other hand,
that no element in $s(Y_{m-1})$ is smaller or equal to $X_{n+m-1}$. There may be
such an erased element $p_m$ in $s(Y_{m-1}; s(X))$, but it certainly did not belong to
$s(Y_{m-1})$. In both cases we infer that relation $s(Y_m) \subseteq s(Y_m; s(X))$ is true.



Second sequence:

The proof is very similar to the first one: Define, in a setting similar
to that of the first sequence, $u(X, T)$ to be the cardinality of the
submultiset of $s(X, T)$ of elements with multiplicity two. Define $a_n$ to be the
expected number of elements with multiplicity two at stage $n$. That is,
$a_n = E_{|X|=n}[u(X)]$, so that $a_n/n = l(n) - c(n)$. We will prove by induction on $m$ that if
$X = (X_0, \ldots, X_{n-1})$ and $Y_m = (X_n, \ldots, X_{n+m-1})$ then

$$u(XY_m) \le u(X) + u(Y_m). \quad (5)$$

The result is clear for $m = 0$. In the general case, $m \ge 1$, $v(XY_m)$
modifies $v(XY_{m-1})$ by adding $X_{n+m-1}$ and, perhaps, erasing some $p_m$, the
largest element (if any) in $s(XY_{m-1})$ smaller or equal to $X_{n+m-1}$, if this
element is in $v(XY_{m-1})$. Thus $u(XY_m) - u(XY_{m-1}) \in \{0, 1\}$. Similarly, $v(Y_m)$
modifies $v(Y_{m-1})$ by adding $X_{n+m-1}$ and, perhaps, erasing some $r_m$, the
largest element (if any) in $s(Y_{m-1})$, if this element is smaller or equal to
$X_{n+m-1}$. Thus $u(Y_m) - u(Y_{m-1}) \in \{0, 1\}$.

If $u(XY_{m-1}) \le u(X) - 1 + u(Y_m)$ then clearly $u(XY_m) - u(Y_m) \le u(X)$.
The only problematic case may be when $u(XY_{m-1}) - u(Y_{m-1}) = u(X)$,
$u(XY_m) - u(XY_{m-1}) = 1$, $u(Y_m) - u(Y_{m-1}) = 0$. But this means that $r_m$
exists (and is erased from $v(Y_{m-1})$). Since $s(Y_{m-1}) \subseteq s(Y_{m-1}; s(X))$, $r_m$
must be erased from $s(Y_{m-1}; s(X))$. In other words, the bad case above
cannot occur.


7.3 Proof of Theorem 6

We use essentially the classical proof based on the hook walk from [GNW79],
slightly adapted to our framework: Define for a heap table $T$ with $n$
elements

$$F_T = \frac{n!}{\prod_{(\alpha,i) \in dom(T)} H_{\alpha,i}}$$

and $C(T)$, the set of corners of $T$, to be the set of cells $(\alpha, i)$ of $T$ with $H_{\alpha,i} = 1$.
Given $\gamma \in C(T)$ define $T_\gamma = T \setminus \{\gamma\}$. We want to prove that

$$\sum_{\gamma \in C(T)} \frac{F_{T_\gamma}}{F_T} \ge 1. \quad (6)$$

(of course, for $k = 1$ we can actually prove equality in Formula (6) above).
This will ensure (by induction upon table size) the truth of our lower
bound.





• Choose (uniformly at random) a cell $(\alpha_1, i_1)$ of $T$.

• Let $t = 1$.

• while ($(\alpha_t, i_t)$ is not a corner of $T$):

    • Choose $(\alpha_{t+1}, i_{t+1})$ uniformly at random from $H((\alpha_t, i_t)) \setminus \{(\alpha_t, i_t)\}$.

    • Let $t = t + 1$.

• Return corner $(\alpha_t, i_t)$.

Figure 6: The hook walk.
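The walk of Figure 6 can be sketched as follows. The representation (a dictionary from address tuples to vectors, 1-based vector indices) and the names `hook_cells` and `hook_walk` are our own assumptions, introduced only for illustration.

```python
import random


def hook_cells(tableau, cell):
    """The set of cells in the hook of `cell` = (address, index)."""
    addr, i = cell
    heap_part = {(b, i) for b, w in tableau.items()
                 if b[:len(addr)] == addr and len(w) >= i}
    vector_part = {(addr, j) for j in range(i, len(tableau[addr]) + 1)}
    return heap_part | vector_part


def hook_walk(tableau, rng):
    """Run the hook walk once and return the corner where it stops."""
    cells = sorted((a, i) for a, vec in tableau.items()
                   for i in range(1, len(vec) + 1))
    cell = rng.choice(cells)                    # uniform starting cell
    while True:
        rest = sorted(hook_cells(tableau, cell) - {cell})
        if not rest:                            # hook length 1: a corner
            return cell
        cell = rng.choice(rest)                 # uniform move inside the hook
```

Every move goes to a strictly longer address or to a larger vector index, so the walk always terminates. On the small tableau `{(): [1, 2], (1,): [3, 5], (2,): [4]}` it can only end in one of the two corners `((1,), 2)` and `((2,), 1)`.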


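We need some more notation: for $(\alpha, i) \in dom(T)$, denote

$$Heap_{\alpha,i} = \{(\beta, i) \in dom(T) : \alpha \sqsubseteq \beta\}$$

the heap hook of $(\alpha, i)$, and by

$$Vec_{\alpha,i} = \{(\alpha, j) \in dom(T) : i \le j\}$$

its vector hook (thus $H_{\alpha,i} = |Heap_{\alpha,i}| + |Vec_{\alpha,i}| - 1$).
By applying formulas for $F_T$, $F_{T_\gamma}$ we get

$$\frac{F_{T_\gamma}}{F_T} = \frac{1}{n} \prod_{\gamma \in Heap_{\beta,j}} \frac{H_{\beta,j}}{H_{\beta,j} - 1} \prod_{\gamma \in Vec_{\beta,j}} \frac{H_{\beta,j}}{H_{\beta,j} - 1} \quad (8)$$

$$= \frac{1}{n} \prod_{\gamma \in Heap_{\beta,j}} \left(1 + \frac{1}{H_{\beta,j} - 1}\right) \prod_{\gamma \in Vec_{\beta,j}} \left(1 + \frac{1}{H_{\beta,j} - 1}\right) \quad (9)$$

where both products range over cells $(\beta, j) \ne \gamma$ of $T$.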

We consider the hook walk on $T$, defined in Figure 6.

Interpret terms from the product in formula (9) as probabilities of paths
in the hook walk, ending in corner $\gamma$, as follows:

• Choose the starting cell uniformly at random from $T$ (i.e. with probability
$1/n$).

• Terms $(\beta, i)$ in the first product whose contribution is $\frac{1}{H_{\beta,i} - 1}$
correspond to cells where the walk makes "hook moves" towards $\gamma$.

• Terms $(\beta, i)$ in the second product whose contribution is $\frac{1}{H_{\beta,i} - 1}$
correspond to cells where the walk makes "vector moves" towards $\gamma$.

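Indeed, consider a path $P : (\alpha, i) := (\alpha_1, i_1) \to (\alpha_2, i_2) \to \ldots \to (\alpha_n, i_n) = \gamma$.
Define its hook projection to be set $A = \{\alpha_1, \alpha_2, \ldots, \alpha_n\}$ and its vector
projection to be the set $B = \{i_1, i_2, \ldots, i_n\}$.

Just as in [GNW79], given set of words $A = \{\alpha_1, \ldots, \alpha_m\}$, with $\alpha_1 = \alpha$
and $\alpha_j \sqsubseteq \alpha_{j+1}$, and set of integers $B = \{i_1, \ldots, i_r\}$ with $i_1 = i$ and $i_l < i_{l+1}$,
the probability $p(A, B)$ that the hook walk has the hook (vector) projections
$A$ ($B$) (thus starting at $(\alpha_1, i_1)$) satisfies

$$p(A, B) \le \prod_{j=1}^{m-1} \frac{1}{H_{\alpha_j, i_r} - 1} \cdot \prod_{l=1}^{r-1} \frac{1}{H_{\alpha_m, i_l} - 1} \quad (10)$$

Indeed, as in [GNW79]

$$p(A, B) = \frac{1}{H_{\alpha_1, i_1} - 1} \left[ p(A - \{\alpha_1\}, B) + p(A, B - \{i_1\}) \right]
\le \frac{1}{H_{\alpha_1, i_1} - 1} \left[ (H_{\alpha_1, i_r} - 1) + (H_{\alpha_m, i_1} - 1) \right] \cdot (RHS) \quad (11)$$

where (RHS) is the right-hand side product in equation (10), and in the
second row we used the inductive hypothesis.

For $k = 1$, in [GNW79] we would use an equality of type $H_{\alpha_1, i_1} - 1 =
(H_{\alpha_1, i_r} - 1) + (H_{\alpha_m, i_1} - 1)$. For $k \ge 2$ such an equality is no longer true, and
we only have inequality

$$H_{\alpha_1, i_1} - 1 \ge (H_{\alpha_1, i_r} - 1) + (H_{\alpha_m, i_1} - 1) \quad (12)$$

leading to a proof of equation (10).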

To justify inequality (12), note that, by property (b) of heap tableaux,
since $\alpha_1 \sqsubseteq \alpha_m$,

$$|Vec(\alpha_m, i_1)| \le |Vec(\alpha_1, i_1)| \quad (13)$$

On the other hand

$$|Heap(\alpha_1, i_1)| \ge |Heap(\alpha_1, i_r)| + (|Heap(\alpha_m, i_1)| - 1). \quad (14)$$

This is true by monotonicity property (c) of heap tableaux; every path
present in the heap $H_r$ rooted at $(\alpha_1, i_r)$ is also present in the heap $H_1$
rooted at $(\alpha_1, i_1)$. Heap $H_r$ is empty below node $\gamma = (\alpha_m, i_r)$, but $H_1$
contains the subheap rooted at $(\alpha_m, i_1)$ (of size $|Heap(\alpha_m, i_1)| - 1$) and maybe some
other subheaps, rooted at nodes $w \in H_1$ whose correspondent in $H_r$ has no
descendants. Summing up equations (13) and (14) we get our desired
inequality (12). Example in Figure 7 shows that inequality (12) can be strict:
The hook length of $H_{1,\lambda} - 1 = 7$ but $H_{2,\lambda} - 1 = 2 - 1$ and $H_{1,0} - 1 = 2 - 1$.
The reason is that the grayed cells are not counted in the hook of (1,0), but
they belong to the hook of $(1, \lambda)$.


Figure 7: Example showing that inequality (12) is strict.
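The numbers quoted for Figure 7 can be rechecked mechanically. The sketch below is illustrative only: it assumes that the hook of a cell consists of the cells further along its own vector together with its descendants in its own heap (the decomposition into Vec and Heap used in (13) and (14)), and it encodes a shape as a hypothetical dictionary from words to vector lengths, with the empty string standing for $\lambda$.

```python
def hook_length(shape, word, i):
    """Hook length of cell (i, word), under the assumed Vec + Heap - 1 definition."""
    # Cells at positions i, i+1, ... of the vector indexed by `word`.
    vec_part = shape[word] - i + 1
    # Cells of heap H_i whose word extends `word` (the cell itself included).
    heap_part = sum(1 for w, length in shape.items()
                    if w.startswith(word) and length >= i)
    return vec_part + heap_part - 1  # the cell itself was counted twice


def w_shape(r):
    """Hypothetical encoding of the two-heap tableau with n = 2r + 1 cells."""
    shape = {"": 2, "0": 2}            # vectors of length two: lambda and 0
    for j in range(1, 2 * r - 2):      # words 1, 11, ..., 1^(2r-3), length one
        shape["1" * j] = 1
    return shape


shape = w_shape(4)                     # the tableau of Figure 7
h_root = hook_length(shape, "", 1)     # cell (1, lambda)
h_right = hook_length(shape, "", 2)    # cell (2, lambda)
h_down = hook_length(shape, "0", 1)    # cell (1, 0)
print(h_root - 1, h_right - 1, h_down - 1)  # prints: 7 1 1
```

Under these assumptions the left-hand side of (12) is 7 while the right-hand side is 1 + 1 = 2, so the inequality is strict for this tableau.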

Finally, adding up suitable inequalities (10) we infer that $s_\gamma$, the probability that the walk ends up at $\gamma$, equal to

$$s_\gamma = \frac{1}{n} \sum p(A,B)$$

(the sum being over all suitable sets $A$, $B$), is less than or equal to the expansion (9). Since the sum of probabilities adds up to 1, inequality (6) follows.

Let us now deal with examples/counterexamples.

First we present a set of arbitrarily large heap tableaux, different from both heap-ordered trees and Young tableaux, for which the hook inequality is tight: for $r \ge 2$, $k \ge 2$ consider heap table $T_{r,k}$ (Fig. 8(a)) to have $n = S_{k,r} + k - 1$ nodes, distributed in a complete $k$-ary tree $H_1$ with $r$ levels $0, 1, \ldots, r-1$ and $S_{k,r}$ nodes, and then $k-1$ one-element heaps $H_2, \ldots, H_k$. We employ notation

$$S_{k,i} = 1 + k + \cdots + k^{i-1} = \frac{k^i - 1}{k - 1}.$$

The number of ways to fill up such a heap tableau is $\binom{n-1}{k-1} \cdot N_{k,r}$, where $N_{k,r}$ is the number of ways to fill up a complete $k$-ary tree with $r$ levels:

$$N_{k,r} = \frac{(S_{k,r})!}{\prod_{i=0}^{r-1} (S_{k,r-i})^{k^i}}.$$



























Figure 8: (a) Example $T_{3,3}$. (b) Counterexample $W_4$. The hook formula is tight for heap tableau (a), but not tight for (b). In both cases cell contents represent the hook lengths.


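This happens because for every subset $A$ of $\{2, \ldots, n\}$ of cardinality $k-1$, element 1 together with those not in $A$ can be distributed in $H_1$ in $N_{k,r}$ ways.

Putting all things together, the total number of fillings of $T_{r,k}$ is

$$\frac{(S_{k,r} + k - 2)! \cdot (S_{k,r})!}{(k-1)! \cdot (S_{k,r} - 1)! \cdot S_{k,r} \cdot \prod_{i=1}^{r-1} (S_{k,r-i})^{k^i}} = \frac{(n-1)!}{(k-1)! \cdot \prod_{i=1}^{r-1} (S_{k,r-i})^{k^i}}.$$

Hook lengths are $1, 2, \ldots, k-1$ (for the nodes in the one-element heaps), $S_{k,r-i}$ with multiplicity $k^i$ for $i = 1, \ldots, r-1$ (for the non-root nodes in $H_1$) and $n$ (for the root node of $H_1$). The resulting formula

$$\frac{n!}{n \cdot (k-1)! \cdot \prod_{i=1}^{r-1} (S_{k,r-i})^{k^i}}$$

is the same as the total number computed above.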
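The agreement between the two counts can be checked numerically for small parameters. The following sketch is only illustrative and its function names are hypothetical. It counts heap-ordered fillings of the complete $k$-ary tree by a direct recursion (the root receives the minimum and the remaining labels are split among the $k$ subtrees), which is an independent check of the closed form used for $N_{k,r}$, and then compares $\binom{n-1}{k-1} \cdot N_{k,r}$ with $n!$ divided by the product of the hook lengths listed above.

```python
from fractions import Fraction
from math import comb, factorial, prod


def S(k, i):
    """Number of nodes of a complete k-ary tree with i levels."""
    return sum(k ** j for j in range(i))


def tree_fillings(k, r):
    """Heap-ordered fillings of a complete k-ary tree with r levels (recursion)."""
    if r == 1:
        return 1
    child = S(k, r - 1)
    # Ways to split the non-root labels into k ordered blocks of equal size.
    splits = factorial(k * child) // factorial(child) ** k
    return splits * tree_fillings(k, r - 1) ** k


def tree_fillings_closed_form(k, r):
    """Closed form: S_{k,r}! divided by the product of all subtree sizes."""
    return factorial(S(k, r)) // prod(S(k, r - i) ** (k ** i) for i in range(r))


def tableau_fillings(k, r):
    """Choose the k-1 labels of the one-element heaps, then fill the tree."""
    n = S(k, r) + k - 1
    return comb(n - 1, k - 1) * tree_fillings(k, r)


def hook_bound(k, r):
    """n! divided by the product of the hook lengths listed in the text."""
    n = S(k, r) + k - 1
    hooks = factorial(k - 1) * n * prod(S(k, r - i) ** (k ** i) for i in range(1, r))
    return Fraction(factorial(n), hooks)


for k in range(2, 5):
    for r in range(2, 5):
        assert tree_fillings(k, r) == tree_fillings_closed_form(k, r)
        assert tableau_fillings(k, r) == hook_bound(k, r)
print(tableau_fillings(3, 3))
```

Exact rational arithmetic (`Fraction`) is used for the hook bound so that any failure of integrality would be visible rather than hidden by integer division.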

Now for the counterexamples: consider heap tableaux $W_r$ (Fig. 8(b), identical to the heap tableau in Fig. 7), defined as follows: $W_r$ consists of two heaps, $H_1$ with cells with addresses $(1, \lambda), (1, 0), (1, 1), (1, 11), \ldots, (1, 1^{2r-3})$, and $H_2$ with cells with addresses $(2, \lambda), (2, 0)$. $W_r$ has $n = 2r + 1$ nodes.

Hook values of cells in $H_1$ are $2r, 2, 2r-3, 2r-4, \ldots, 1$. Hook values of cells in $H_2$ are 2, 1, respectively. Thus the hook formula predicts

$$\frac{(2r+1)!}{2 \cdot 2 \cdot 2r \cdot (2r-3) \cdot (2r-4) \cdots 1} = \frac{(2r+1)(2r-1)(2r-2)}{4}$$

ways to fill up the table. If $r$ is even then the number above is not an integer, so the hook formula cannot be exact for these tableaux.
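The integrality claim is easy to confirm with exact arithmetic. The sketch below (an illustration, with a hypothetical function name) rebuilds the list of hook values stated above for $W_r$, evaluates $n!$ divided by their product as an exact fraction, and compares it with the closed form $(2r+1)(2r-1)(2r-2)/4$.

```python
from fractions import Fraction
from math import factorial, prod


def hook_prediction(r):
    """n! over the product of the hook values listed for W_r, as an exact fraction."""
    hooks_h1 = [2 * r, 2] + list(range(2 * r - 3, 0, -1))  # 2r, 2, 2r-3, ..., 1
    hooks_h2 = [2, 1]
    n = len(hooks_h1) + len(hooks_h2)
    assert n == 2 * r + 1
    return Fraction(factorial(n), prod(hooks_h1 + hooks_h2))


for r in range(2, 9):
    value = hook_prediction(r)
    # Agrees with the closed form from the text.
    assert value == Fraction((2 * r + 1) * (2 * r - 1) * (2 * r - 2), 4)
    # The prediction is an integer exactly when r is odd.
    print(r, value, value.denominator == 1)
```

Since $(2r+1)(2r-1)$ is odd, the value equals an odd number times $(r-1)/2$, which explains the parity pattern in the output.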

7.4 Proof of Theorem 0 

We prove that inserting a single integer element $x$ into a heap tableau $T$ results in another heap tableau $T \leftarrow x$. Therefore inserting a permutation $X$ will result in a heap tableau.

By construction, when an element is appended to a vector, the vector remains increasing. Also, if an element $y$ bumps another element $z$ from a vector $V$ (presumed nondecreasing) then $z$ is the smallest element in $V$ greater than $y$. Thus, replacing $z$ by $y$ preserves the nondecreasing nature of the vector $V$.
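The single-vector step described in the last paragraph can be sketched as follows. This is a minimal illustration of appending or bumping within one nondecreasing vector only; it is not the full Schensted-HEAP$_k$ procedure, and the function name is hypothetical.

```python
from bisect import bisect_right


def insert_into_vector(vector, y):
    """Insert y into a nondecreasing list in place.

    Returns the bumped element, or None if y was appended at the end.
    """
    pos = bisect_right(vector, y)  # index of the smallest element greater than y
    if pos == len(vector):
        vector.append(y)           # y is at least as large as every element
        return None
    bumped = vector[pos]
    vector[pos] = y                # left neighbours <= y < bumped <= right neighbours
    return bumped


v = [2, 3, 5]
print(insert_into_vector(v, 1), v)  # prints: 2 [1, 3, 5]
print(insert_into_vector(v, 7), v)  # prints: None [1, 3, 5, 7]
```

Because the replaced entry is the smallest one exceeding `y`, its left neighbours are at most `y` and its right neighbours are larger, so the list stays sorted after either branch.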

All we need to verify is that min-heap invariant (b) (initially true for the one-element heap tableau) also remains true when inserting a new element $x$.

The case when $x$ is appended to $V_\lambda$ is clear: since invariant (b) was true before inserting $x$, for every address $r$ we have $|V_\lambda| \ge |V_r|$. See the example above when we append $x = 9$. Thus what we are doing, in effect, by appending $x$ to $V_\lambda$ is start a new heap.

Suppose instead that inserting $x$ bumps element $x_1$ from $V_\lambda$. Necessarily $x < x_1$. Suppose $i$ is the position of $x_1$ in $V_\lambda$, that is, $x_1$ was the root of heap $H_i$. By reducing the value of the root, the heap $H_i$ still verifies the min-heap invariant. Now suppose $x_1$ bumps element $x_2$. We claim that $x_2$ has rank at most $i$ in its vector. Indeed, the element with rank $i$ in the vector of $x_2$ was larger than $x_1$ (by the min-heap property of $H_i$). So $x_2$ must have had rank at most $i$. Let $j$ be this rank.

Since $x_1 < x_2$, by replacing $x_2$ by $x_1$ the min-heap property is satisfied "below $x_2/x_1$". It is satisfied "above $x_2$" as well, since the parent of $x_2$ either was (and still is) the root of $H_j$, a number less than or equal to $x_1$ (in case $j < i$), or is $x$ (in case the rank of $x_2$ is exactly $i$).

If $x_2$ bumps $x_3$, etc., we repeat the argument above on the corresponding sub-min-heap tableau.





Figure 9: Inserting x and the bumps it determines. 

Suppose, finally, that element $x_n$, bumped from $V_\alpha$ by $x_{n-1}$, is appended to vector $V_\beta$. Let $s$ be the index of $x_n$ in $V_\beta$. We claim that $|V_\alpha| \ge s$.

Indeed, $x_n$ is larger than the first $s-1$ elements of $V_\beta$. By the min-heap property, it is also larger than the initial $s-1$ elements of $V_\alpha$ as well. So its index in $V_\alpha$ before getting bumped could not have been less than $s$. That means that appending $x_n$ does not violate the min-heap invariant (b).


7.5 Proof of Theorem 8

Given permutation $\sigma \in S_n$, denote by $P_\sigma$ the heap table obtained by applying the Schensted-HEAP$_k$ algorithm.

Define a second heap table $Q_\sigma$ as follows: whenever we insert $\sigma(i)$ into $P$, we record the resulting sequence of bumps and insert $i$ at the last place involved in the bumps.

Example 4. Let $k = 2$ and consider the permutation $\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 4 & 2 & 6 & 3 & 5 & 1 \end{pmatrix}$.

The two corresponding heap tableaux are constructed below. For drawing convenience, during the insertion process they are not displayed in the heap-like form, but rather in the more compact Young-table equivalent format. The resulting heap tableaux are displayed in Figure 10.




















Figure 10: (a) Heap tableau $P_\sigma$. (b) (Standard) heap tableau $Q_\sigma$.

There are two things to prove about the algorithm outlined above:

(i). For every permutation $\sigma$, $Q_\sigma$ is a heap tableau of the same shape as heap tableau $P_\sigma$. Moreover,

Lemma 4. $Q_\sigma$ is a heap tableau in standard form.

(ii). One can uniquely identify permutation $\sigma$ from the pair $(P_\sigma, Q_\sigma)$.

(i). The fact that the shape is the same is easy: whenever number $\sigma(i)$ is inserted into $P_\sigma$, this table changes by exactly one (filled) position. When $i$ is inserted into $Q_\sigma$, the position on which it is inserted is the unique position that was added to $P_\sigma$: the position of the final insertion after a (perhaps empty) sequence of bumps. Therefore the two heap tableaux have the same shape throughout the process, and at the end of it.

Let us show now that $Q_\sigma$ is a heap tableau. We will show that invariants (b), (c) remain true throughout the insertion process.

They are, indeed, true at the beginning when $Q_\sigma = [1]$. Proving the heap invariant (b) is easy: numbers are inserted into $Q_\sigma$ in the order $1, 2, \ldots, n$. Each number is, therefore, larger than any number that is an ancestor in its heap. As each number $i$ is inserted as a leaf in its corresponding heap, all heap conditions are still true after its insertion.

The vector invariant (c) is equally easy: number $i$ is appended to an old vector or starts a new one. The second case is trivial. In the first one $i$ is the largest number inserted so far into $Q_\sigma$, therefore the largest in its vector.

Finally, the fact that $Q_\sigma$ is a standard tableau follows from the algorithm: Schensted-HEAP$_k$ starts a new vector from the leftmost position available. Therefore when it starts a new vector, its siblings to the left have acquired a smaller number, as they were already created before that point. Also, when it starts a new vector, all the vectors on the level immediately above have been created (otherwise Schensted-HEAP$_k$ would have started a new vector there) and have, thus, acquired a smaller number.

(ii). This is essentially the same proof idea as that of the Robinson-Schensted correspondence for ordinary Young tableaux: given heap tableaux $P$, $Q$ with the same shape we will recover the pairs $(n, \sigma(n))$, $(n-1, \sigma(n-1))$, ..., $(1, \sigma(1))$ in this backwards order by reversing the sequences of bumps. We will work in the more general setting when $P$ contains $n$ distinct numbers, not necessarily those from 1 to $n$. On the other hand, since $Q$ is standard, $Q$ will contain these numbers, each of them exactly once.

The result is easily seen to be true for $n = 1$, $n = 2$. From now on we will assume that $n \ge 3$ and reason inductively.

Suppose $n$ is in vector $V_\lambda$ of $Q_\sigma$. Then the insertion of $\sigma(n)$ into $P_\sigma$ did not provoke any bumps, and $\sigma(n)$ is the integer in vector $V_\lambda$ of $P_\sigma$ sitting in the same position as $n$ does in $Q_\sigma$. Suppose, on the other hand, that $n$ is in a different vector of $Q_\sigma$. Then the position of $n$ is the outcome of a series of bumps, caused by the insertion of $\sigma(n)$.

Let $x$ be the integer in $P_\sigma$ sitting at the same position as $n$ in $Q_\sigma$. Then $x$ must have been bumped from the parent vector in the heap table by some $y$. The element $y$ is uniquely identified, as the largest element smaller than $x$ in that vector. There must exist a smaller element in that vector by the heap invariant, so $y$ is well-defined. Now $y$ must have been in turn bumped by some $z$ in the parent vector. We identify $z$ in the same way and continue going upwards, until we reach vector $V_\lambda$, identifying element $\sigma(n)$.



Example 5. Consider, for example, the case of $n = 6$ in Figure 10. Element 2 in $P_\sigma$ (sitting in the corresponding position) must have been bumped by 1 in the top row. Therefore $\sigma(6) = 1$.


Now we delete $\sigma(i)$, $i$ from the two heap tableaux and proceed inductively, until we are left with two tables with one element, identifying permutation $\sigma$ this way.
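Since the argument is modelled on the Robinson-Schensted correspondence for ordinary Young tableaux, the reverse-bumping idea can be illustrated in that classical setting. The sketch below implements the classical row insertion and its inverse, not Schensted-HEAP$_k$; the function names are hypothetical. The inverse locates the largest label in $Q$, removes the matching entry of $P$, and walks upwards, each time swapping with the largest element smaller than the current value.

```python
from bisect import bisect_left, bisect_right


def rs_insert(P, Q, value, label):
    """Classical row insertion of `value` into P, recording `label` in Q."""
    row = 0
    while True:
        if row == len(P):                      # start a new row
            P.append([value])
            Q.append([label])
            return
        pos = bisect_right(P[row], value)
        if pos == len(P[row]):                 # append, no further bumps
            P[row].append(value)
            Q[row].append(label)
            return
        P[row][pos], value = value, P[row][pos]  # bump and move one row down
        row += 1


def rs(perm):
    P, Q = [], []
    for label, value in enumerate(perm, start=1):
        rs_insert(P, Q, value, label)
    return P, Q


def rs_inverse(P, Q):
    """Recover the permutation from (P, Q) by reversing the bumps."""
    P = [row[:] for row in P]
    Q = [row[:] for row in Q]
    n = sum(len(row) for row in Q)
    perm = [None] * n
    for label in range(n, 0, -1):
        row = next(r for r, q in enumerate(Q) if q[-1] == label)
        Q[row].pop()
        value = P[row].pop()
        if not P[row]:                         # an emptied row is always the last one
            P.pop(row)
            Q.pop(row)
        for above in range(row - 1, -1, -1):
            pos = bisect_left(P[above], value) - 1  # largest element smaller than value
            P[above][pos], value = value, P[above][pos]
        perm[label - 1] = value
    return perm


P, Q = rs([4, 2, 6, 3, 5, 1])
print(P)                 # prints: [[1, 3, 5], [2, 6], [4]]
print(Q)                 # prints: [[1, 3, 5], [2, 4], [6]]
print(rs_inverse(P, Q))  # prints: [4, 2, 6, 3, 5, 1]
```

In the heap setting the role of "the row above" is played by the parent vector, and the tableaux produced by Schensted-HEAP$_k$ for the same permutation will in general differ from the Young tableaux printed here.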

What allows us to employ the induction hypothesis is the following

Lemma 5. Removing the largest element $n$ from a standard heap tableau $T$ yields another standard heap tableau.

Proof. Suppose $n$ is in a vector of length at least two. Clearly, by removing $n$ all the vectors in the heap remain the same, so the resulting table is standard.

Suppose, therefore, that $n$ is the only element in a vector $V_\beta$ of $T$, $\beta = zb$, $b \in \Sigma_k$. Since $T$ was standard, all the left sibling vectors $V_{za}$ of $V_\beta$ ($a \in \Sigma_k$, $a < b$) are nonempty, and all the vectors on previous levels of $T$ are nonempty.

Removing $V_\beta$ preserves these properties (its leftmost sibling becomes the last vector, or the level disappears completely).

□ 


Completing the proof of the lemma also completes the proof of Theorem 8.


□ 



